QSAR Analysis with Support Vector Machines and Graph Kernels
Abstract
Kernel methods, such as support vector machines, have been applied to a variety of problems in bioinformatics. Recently, marginalized kernels between labeled graphs have been proposed [2, 3], which make it possible to apply kernel methods to the analysis and classification of chemical compounds, for example in QSAR (quantitative structure-activity relationship) analysis. These graph kernels are based on the detection of common paths between different graphs. Each kernel corresponds to a dot product between the graphs mapped to an infinite-dimensional feature space, yet it can be computed in polynomial time with respect to the graph sizes. Encouraging experimental results suggest that this approach might be useful in the analysis of chemical compounds. These graph kernels, however, are subject to several limitations. First, the choice of implicitly representing each graph by the set of path probabilities under a simple random walk model might be questioned. In chemoinformatics, subgraphs are believed to be more relevant features than paths. Moreover, the random walk model is subject to "tottering": a walk can move in one direction and immediately return to its previous position, which produces redundant paths that might weaken the characterization of a given graph once it is mapped to the feature space of these graph kernels. Second, the graph kernel has a computational complexity roughly proportional to the product of the sizes of the two graphs being compared, which makes implementations slow on real-world problems.
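The idea of comparing graphs through common label paths, and the effect of tottering, can be illustrated with a small sketch. This is not the marginalized kernel of [2, 3]: it replaces the random-walk path probabilities with a plain geometric weight on walk length, truncates walks at a fixed maximum length, and compares vertex labels only. The function name `walk_kernel` and the parameters `max_len`, `decay` and `allow_tottering` are assumptions introduced here for illustration.

```python
from collections import defaultdict


def walk_kernel(labels1, adj1, labels2, adj2,
                max_len=3, decay=0.5, allow_tottering=True):
    """Weighted count of walk pairs (one walk per graph) with equal label sequences.

    labels*: list of vertex labels; adj*: dict mapping vertex index -> neighbor list.
    A walk with k edges contributes decay**k (an illustrative weighting, not the
    random-walk probabilities of the marginalized kernel).
    """
    # State (u, v, prev_u, prev_v) -> number of matching walk pairs that
    # currently end at u in graph 1 and v in graph 2.
    counts = defaultdict(float)
    for u, label_u in enumerate(labels1):
        for v, label_v in enumerate(labels2):
            if label_u == label_v:
                counts[(u, v, None, None)] = 1.0

    total = sum(counts.values())  # walks with zero edges (single vertices)
    weight = 1.0
    for _ in range(max_len):
        weight *= decay
        extended = defaultdict(float)
        for (u, v, prev_u, prev_v), c in counts.items():
            for u2 in adj1[u]:
                # A tottering step returns to the vertex the walk just left.
                if not allow_tottering and u2 == prev_u:
                    continue
                for v2 in adj2[v]:
                    if not allow_tottering and v2 == prev_v:
                        continue
                    if labels1[u2] == labels2[v2]:
                        extended[(u2, v2, u, v)] += c
        counts = extended
        total += weight * sum(counts.values())
    return total


# A single C-O bond compared with itself, walks of up to two edges, no decay.
labels = ["C", "O"]
adj = {0: [1], 1: [0]}
with_tottering = walk_kernel(labels, adj, labels, adj, max_len=2, decay=1.0)
without_tottering = walk_kernel(labels, adj, labels, adj, max_len=2, decay=1.0,
                                allow_tottering=False)
print(with_tottering, without_tottering)  # 6.0 4.0
```

In this toy case the only two-edge walks are C-O-C and O-C-O, both of which totter, so excluding them lowers the value from 6 to 4. Each extension step iterates over pairs of matched vertices and pairs of their neighbors, which reflects why the cost grows roughly with the product of the two graph sizes.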
Publication date: 2004